gh-139353: Add Objects/unicode_codecs_utf.c file #142190
vstinner wants to merge 2 commits into python:main
Rename functions:

* ascii_decode() => _PyUnicode_DecodeASCII()
* backslashreplace() => _PyUnicode_backslashreplace()
* raise_encode_exception() => _PyUnicode_RaiseEncodeException()
* unicode_decode_call_errorhandler_writer() => _PyUnicode_DecodeCallErrorHandler()
* unicode_decode_utf8() => _PyUnicode_DecodeUTF8()
* unicode_encode_call_errorhandler() => _PyUnicode_EncodeCallErrorHandler()
* unicode_encode_utf8() => _PyUnicode_EncodeUTF8()
* xmlcharrefreplace() => _PyUnicode_xmlcharrefreplace()

Move static inline functions and macros to pycore_unicodeobject.h:

* _PyUnicode_CHECK()
* _PyUnicode_UTF8()
* PyUnicode_UTF8()
* PyUnicode_SET_UTF8()
* PyUnicode_UTF8_LENGTH()
* PyUnicode_SET_UTF8_LENGTH()
@serhiy-storchaka: What do you think of this split?
I do not feel easy about this. The UTF codecs code is tightly coupled with other code. This PR makes some static functions non-static and exposes local functions in a header. This means that the compiler cannot inline them completely: it must also keep a non-inlined copy, and this can affect its decision to inline them. It also means that low-level C API which was previously not intended for use outside of the unicodeobject.c file can now be used in other CPython code, and it will be used, for sure. This will also affect optimization and maintainability. If the goal of this change is to improve maintainability, I am not sure that its net effect on maintainability is positive.
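To illustrate the linkage concern, here is a minimal sketch. It is not code from this PR: `is_ascii_static()` and `is_ascii_exported()` are hypothetical names, and the actual effect on inlining depends on the compiler and optimization flags.

```c
#include <stddef.h>

/* Hypothetical helper with internal linkage: visible only in this file.
   The compiler sees every caller, so it is free to inline all of them
   and emit no standalone copy of the function. */
static int
is_ascii_static(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0x80) {
            return 0;
        }
    }
    return 1;
}

/* The same helper with external linkage, as it would look once it is
   declared in a shared header. Other translation units may call it, so
   an out-of-line copy has to be kept even when local calls are inlined. */
int
is_ascii_exported(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0x80) {
            return 0;
        }
    }
    return 1;
}
```

Both functions behave identically. The difference is only in what the compiler is allowed to assume about callers, and in who is able to call the function from elsewhere in the code base.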
An alternative is to put all codecs in a single file: #141469 (6,671 lines of C code). |